Cyanobacterial α-carboxysome carbonic anhydrase is allosterically regulated by the Rubisco substrate RuBP

Cyanobacterial CO2 concentrating mechanisms (CCMs) sequester a globally consequential proportion of carbon into the biosphere. Proteinaceous microcompartments, called carboxysomes, play a critical role in CCM function, housing two enzymes to enhance CO2 fixation: carbonic anhydrase (CA) and Rubisco. Despite its importance, our current understanding of the carboxysomal CAs found in α-cyanobacteria, CsoSCA, remains limited, particularly regarding the regulation of its activity. Here, we present a structural and biochemical study of CsoSCA from the cyanobacterium Cyanobium sp. PCC7001. Our results show that the Cyanobium CsoSCA is allosterically activated by the Rubisco substrate ribulose-1,5-bisphosphate and forms a hexameric trimer of dimers. Comprehensive phylogenetic and mutational analyses are consistent with this regulation appearing exclusively in cyanobacterial α-carboxysome CAs. These findings clarify the biologically relevant oligomeric state of α-carboxysomal CAs and advance our understanding of the regulation of photosynthesis in this globally dominant lineage.


Analytical size exclusion chromatography
A HiLoad 16/600 Superose 6pg preparative size exclusion chromatography (SEC) column was calibrated using the Cytiva Gel Filtration Calibration HMW kit (product code 28403842) as per manual specifications (28951560 AG). The partition coefficient (Kav) of each protein was used to construct a calibration curve of Kav versus log(molecular mass). Kav was calculated using the equation Kav = (ve - vo)/(vc - vo), where ve is the elution volume, vo is the column void volume, and vc is the geometric column volume. The resulting calibration curves were used for comparison of elution volumes and estimation of theoretical molecular masses.
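As an illustration of the calibration described here, the following minimal Python sketch computes Kav, fits a straight line of Kav against log10(molecular mass), and inverts that line to estimate the mass of a sample from its elution volume. The void volume, column volume, elution volumes, and the helper names (`kav`, `fit_line`, `estimate_mass_kda`) are hypothetical placeholders chosen for the example; they are not the values or code used in this study.

```python
import math


def kav(ve, vo, vc):
    """Partition coefficient Kav = (ve - vo) / (vc - vo)."""
    if vc <= vo:
        raise ValueError("geometric column volume must exceed void volume")
    return (ve - vo) / (vc - vo)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_mass_kda(ve, vo, vc, slope, intercept):
    """Invert the calibration line Kav = slope * log10(M) + intercept."""
    return 10 ** ((kav(ve, vo, vc) - intercept) / slope)


# Hypothetical column volumes (mL); NOT measured values from this study.
VO, VC = 40.0, 120.0

# Hypothetical standards as (molecular mass in kDa, elution volume in mL).
standards = [(669.0, 56.0), (440.0, 60.0), (158.0, 70.0), (75.0, 77.0), (44.0, 82.0)]

log_masses = [math.log10(mass) for mass, _ in standards]
kav_values = [kav(ve, VO, VC) for _, ve in standards]
slope, intercept = fit_line(log_masses, kav_values)

# Estimate the apparent mass of a sample eluting at a hypothetical 63 mL.
print(f"Estimated mass: {estimate_mass_kda(63.0, VO, VC, slope, intercept):.0f} kDa")
```

Larger proteins elute earlier, so the fitted slope is negative; an estimate is only meaningful for elution volumes that fall within the range spanned by the standards.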
Native PAGE
Protein samples were stored in Native Gel-Loading buffer (2.5× TBE, 50% glycerol, 0.1% bromophenol blue) and loaded on 4-20% Mini-PROTEAN TGX Stain-free polyacrylamide gels (Bio-Rad, Cat. No. 4568096). Proteins were separated at 90 V for 90 minutes at 4 °C in native running buffer (25 mM Tris [pH 8.3], 50 mM glycine). NativeMark protein standards were used for molecular weight determination (Invitrogen, LOT 1739010). To visualise proteins, gels were stained with Coomassie Blue dye (Bio-Rad, USA).
To determine how essential the observed zinc ions in the CyCsoSCA structure are for hexamer formation, the protein was subjected to a range of conditions intended to interrupt zinc binding, and the oligomeric state of the resulting samples was then assessed by Native PAGE. CyCsoSCA was dialysed against 2 mM 1,10-phenanthroline, a strong chelating agent, for 24 hours according to previous publications 76. Additionally, protein samples were incubated in SEC buffer at a pH of 3.5, 4, 5, 6, 7, or 8 for 1 hour. All samples were then stored in Native Gel-Loading buffer, loaded, and run as described above.
Figure S5. The oxidant DTNB does not affect RuBP activation of CsoSCA. CA activity was measured using the MIMS method in clarified lysates of E. coli overexpressing CsoSCA, before (-DTNB) and after (+DTNB) incubation with 30 µM DTNB, and before (CsoSCA) and after (CsoSCA + RuBP) addition of 100 µM RuBP. CA activity is reported as a proportion of the maximum recorded CA stimulation rate 7. Values are means ± standard error of four technical replicates. RuBP-dependent CA activity showed no statistically significant differences between DTNB-treated and untreated CsoSCA extracts, as determined by a one-way ANOVA followed by Tukey's Honestly Significant Difference (HSD) test; the p value between 'CsoSCA' samples with and without DTNB was 0.999, and between 'CsoSCA + RuBP' samples 0.981.

Figure S8 (continued). PCA plots show that, relative to apoprotein trajectories, the holoprotein occupies a much narrower conformational space, indicative of more constrained conformational sampling. This corresponds with previous publications that conclude ligand binding stabilises the holoprotein, resulting in less flexibility relative to the apoprotein 79,80.

Figure S13 (continued). In samples incubated at pH 5, the dimeric band increases in intensity and a second band is evident between 242 kDa and 480 kDa, consistent with the hexameric CyCsoSCA (312 kDa). As the pH becomes more basic, the hexameric band increases in intensity and the dimeric band fades. This is consistent with the hexamer destabilising at lower pH as the histidines coordinating the structural zinc ions become protonated, leaving dimers as the prevailing quaternary state in solution. At pH 6 and 7 a second faint band is evident above the dimeric band, between the 66 kDa and 146 kDa markers. The distance between the two bands is too small to denote a trimer, so this band is likely a contaminant. These data are consistent with the NTD acting as an oligomerisation domain, binding structural zinc ions to facilitate the formation of large hexameric CsoSCA assemblies in solution. The standard is labelled (L) with corresponding weights annotated in kDa.

Table S5

Figure S1. CsoSCA homologues contain an extended disordered tail that has been truncated in this study to mitigate aggregation. The N-terminal disordered tail binds to Rubisco to enable CsoSCA encapsulation during α-carboxysome biogenesis 30. In the

Figure S2. Comparison of AlphaFold2 models of full-length CyCsoSCA to the crystallised truncated form. A AlphaFold2 monomer (orange) and chain A of the 8THM structure aligned (Cα RMSD 0.541 Å). The catalytic zinc is included for reference as a grey sphere; no notable deviations were observed. B AlphaFold2 multimer model of full-length CyCsoSCA sequences in a dimer aligned to the 8THM chain A-B dimer (Cα RMSD 1.214 Å). No major structural deviations are evident between model and crystal structure. C The modelled dimer (orange) with the extended disordered N-terminal tail aligned with the complete CyCsoSCA hexamer observed in the crystal structure. The disordered terminus, while long, is flexible and thus should not sterically hinder the formation of the hexameric complex. One can imagine that the increased density of disordered strands upon hexamer assembly may notably increase the propensity for aggregation relative to solutions in which the dimer dominates.

Figure S4.
Figure S3. Density maps of RuBP in each monomer in the CyCsoSCA structure characterised in the main text. At least two sulfate ions from the crystallisation solvent are observed at this site in all monomers. Sulfate ions have been observed bound to β-CA oligomeric interfaces, though rarely in pockets buried as deeply as observed here 27.
data for wild type CyCsoSCA, HnCsoSCA and plotted in Figure 1D and Figure 3D of the main text. Continued on the following page.

Figure S6. The RuBP binding site is distinct from the previously identified allosteric bicarbonate site documented in Type II β-CAs, typified by the Haemophilus influenzae enzyme (HICA; PDB:2A8D) 36,77. A The HICA dimer is shown with monomers in different hues of purple and the CyCsoSCA monomer (chain D) in green. The allosteric ligands are indicated by a magenta circle and the active-site zinc ions by a red circle in each structure. Type II β-CAs exhibit a pH-dependent activity profile symptomatic of an allosteric bicarbonate inhibition mechanism. Broadly, bicarbonate binding causes a conformational shift that disrupts the conserved Asp-Arg active site dyad, causing the Asp to coordinate the zinc ion and displace the catalytically essential water molecule 36,45,77. The allosteric RuBP (CyCsoSCA) and bicarbonate ions (HICA) are shown in ball and stick representation. Zinc ions are shown as spheres. B A structural alignment of the HICA dimer and the CyCsoSCA monomer is shown with a box highlighting the zinc active sites and ligand binding pockets. C A boxed view of the overlaid zinc ions and allosteric ligands. The RuBP ligand associated with CyCsoSCA and the bicarbonate ions (BCT) associated with HICA are shown in ball and stick representation and annotated. In the box below, ligand binding residues are shown in stick representation; purple residues correspond to sites in the HICA structure and green residues to those in the CyCsoSCA structure. The catalytic zinc binding residues C42, D44, H98 and C101 are shown for the HICA monomer closest to the CyCsoSCA RuBP binding site. Though in a similar region of the protein, the two sites employ different residues and the RuBP sits much further from the active

Figure S7. Given the overlapping positions of RuBP and sulfate ions, we hypothesised that sulfate may compete with RuBP at this site, explaining the low ligand density despite high RuBP concentrations in the crystallisation conditions. CyCsoSCA activity in the presence of sulfate and phosphate ions was quantified using MIMS 7 as detailed in the main text. All assays were run in the presence of 20 mM EPPS. Activity in the absence of any salt (20 mM EPPS) is shown, highlighting the requirement for Mg2+ to achieve maximal activity (50 mM MgCl2). Activity levels at this basal condition are indicated as single points/dashed horizontal lines. CA activity detected at increasing concentrations of the key salts MgSO4 and Na2HPO4, as shown in the figure legend, is plotted. These results show CyCsoSCA is significantly inhibited at 50 mM MgSO4 relative to standard assay conditions (20 mM MgCl2). Phosphate, a more biologically relevant ion with analogous size and charge, had a comparable effect to sulfate. CyCsoSCA activity also dropped in the absence of magnesium salts, perhaps because Mg2+ functions to stabilise the ligand in solution. It is unclear whether phosphate would ever reach millimolar concentrations within the carboxysome in vivo. All data points are the mean of three replicates; error bars denote standard error of the mean.

Figure S8. Trajectory analysis of molecular dynamics simulations of CyCsoSCA Chain D with (holoenzyme) and without (apoenzyme) RuBP present. A Cα RMSD (Å) variation across the trajectory for each 300 ns replicate relative to frame 1. Five replicates were conducted for each condition with frame recording intervals of 15 ns; error indicates standard deviation. Both trajectories equilibrated, with a few fluctuations evident in the apoprotein trajectories, particularly after 250 ns. B Cα RMSF (Å) of each residue across the trajectory, highlighting marginal site-by-site differences. C Histogram recording the difference between the RMSF of analogous residues. Average RMSF (Å) was calculated for each residue, and values from the apoenzyme trajectory were subtracted from the average RMSF value of the corresponding residue in the holoenzyme trajectory. Thus, a negative value is indicative of a residue with a greater RMSF in the apoenzyme trajectory and vice versa. These values have then been plotted onto the structure. Principal component analysis of Cartesian coordinates of either holoprotein replicates (D) or apoprotein replicates (E) coloured by replicate (left) or by time (right). These
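The per-residue RMSF difference described for panel C can be sketched as follows. This is a minimal illustration under stated assumptions: each trajectory is taken to be a list of frames already aligned to a common reference, each frame is a list of (x, y, z) Cα coordinates, and the function names (`rmsf`, `mean_rmsf`, `delta_rmsf`) and toy coordinates are hypothetical, not the analysis code or data used in this study.

```python
import math


def rmsf(frames):
    """Per-residue RMSF for one trajectory of pre-aligned frames.

    frames: list of frames; each frame is a list of (x, y, z) tuples,
    one per residue (e.g. C-alpha atoms).
    """
    n_frames = len(frames)
    n_res = len(frames[0])
    values = []
    for r in range(n_res):
        # Mean position of residue r over the trajectory.
        mean = [sum(f[r][k] for f in frames) / n_frames for k in range(3)]
        # Mean squared displacement from that mean position.
        msd = sum(
            sum((f[r][k] - mean[k]) ** 2 for k in range(3)) for f in frames
        ) / n_frames
        values.append(math.sqrt(msd))
    return values


def mean_rmsf(replicates):
    """Average the per-residue RMSF over replicate trajectories."""
    per_replicate = [rmsf(rep) for rep in replicates]
    return [sum(vals) / len(per_replicate) for vals in zip(*per_replicate)]


def delta_rmsf(holo_replicates, apo_replicates):
    """Holo minus apo; negative means the residue is more mobile in the apo form."""
    holo = mean_rmsf(holo_replicates)
    apo = mean_rmsf(apo_replicates)
    return [h - a for h, a in zip(holo, apo)]


# Toy data: one replicate per condition, two residues, two frames (hypothetical).
holo = [[[(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
         [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]]]
apo = [[[(-1.0, 0.0, 0.0), (5.0, 5.0, 5.0)],
        [(1.0, 0.0, 0.0), (5.0, 5.0, 5.0)]]]

print(delta_rmsf(holo, apo))
```

In the toy data the first residue moves only in the apo trajectory and the second only in the holo trajectory, so the two differences have opposite signs, matching the sign convention stated in the caption.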

Figure S9. Additional molecular dynamics replicates with an alternate water model (TIP4P) compared to the existing five replicates. A Cα RMSD (Å) variation across the trajectory for each 300 ns replicate relative to frame 1 as presented in Figure S8A (holoprotein in green, apoprotein in black); the mean is plotted with standard error indicated by shading. Single trajectories with the alternate water model for the apoprotein (red) and holoprotein (blue) are overlaid. B Principal component analyses (PCAs) of the holoprotein (left panel) and apoprotein (right panel) replicates with the relevant alternate water model simulations overlaid.
to taxa as per key in the legend. A complete treefile, multiple sequence alignment, and final sequence annotations are provided as supplementary datafiles.

Figure S11. The RuBP pocket is conserved and positively charged in cyanobacterial CsoSCA variants. CyCsoSCA and HnCsoSCA structures are coloured by electrostatic charge and residue conservation as per the legends in the figure. From top to bottom, the dimer of each variant is shown, followed by a zoomed-in box of the RuBP binding site, and then a view of the ligand pocket in the monomer. Given that RuBP primarily exists in a negatively charged state 35, binding site charge was of key interest. In CyCsoSCA, this site is positively charged, while the comparative region in the constitutively active HnCsoSCA has a negative electrostatic

Figure S12. A model from previous work simulating carboxysome function was adapted to emulate a Cyanobium α-carboxysome with either an RuBP-dependent CA (Modified CA, green dots) or a constitutively active CA (Unmodified CA, black dots) 35. A Rubisco oxygenation activity per second (Rubisco (kcato)) is equivalent between the two systems across modelled cellular RuBP concentrations (Ext RuBP (mM)), corresponding to the equivalent observed carboxylation rates (Figure 5). B The concentration of HCO3- and C CO2 within the carboxysome as a function of modelled cellular RuBP concentrations (Ext RuBP (mM)). D Correspondingly, the concentration of modelled cellular PGA (Ext PGA (mM)) as a function of modelled cellular RuBP concentrations (Ext RuBP (mM)) demonstrates no difference in PGA production between the two systems. E While the carboxysomal CA is capable of both the forward and reverse CO2 hydration/HCO3- dehydration reactions, supplying the carboxysome-encased Rubisco with CO2 requires the dehydration reaction to predominate. The ratio of the fluxes recorded for the CA dehydration to hydration reaction (CA flux (Dehyd/Hyd)) is plotted here across a modelled cellular RuBP gradient (Ext RuBP (mM)). This demonstrates notable differences in CA flux between the two systems at low RuBP concentrations. As Ci appears consistent between the systems, this likely manifests as changes in the carboxysomal proton concentration at these RuBP levels. Indeed, this is consistent with the observed changes in pH presented in Figure 5C. F The ratio of Rubisco carboxylation rates to oxygenation rates (Rubisco C/O) as CA activity (CA flux, indicative of CA activity levels converting HCO3- to CO2) increases with cellular RuBP concentrations in the modified and unmodified systems.

Figure S13. A homohexameric trimer of dimers is the prevailing, biologically relevant form of CsoSCA. A Size exclusion chromatography of CyCsoSCA and HnCsoSCA shows each isoform elutes at approximately the volume expected for the hexamer observed in the CyCsoSCA crystal structure (approx. 312 kDa), independent of protein concentration. An additional peak in the CyCsoSCA samples at both concentrations is also seen at slightly higher molecular weights, suggesting that, while the hexamer is the predominant form, a spectrum of oligomeric states may be present in solution. B A sequence logo showing the key His155 residue required by all monomers to coordinate the structural zinc ion in an octahedral His3(H2O)3 coordination sphere (Figure 1 main text). The key zinc coordination residue appears conserved across all analysed CsoSCA sequences. C To assess the role of the zinc ion in coordinating the hexamer, we perturbed zinc binding and observed the effect on oligomerisation. Native CyCsoSCA (N) and CyCsoSCA dialysed against 2 mM 1,10-phenanthroline, a strong chelating agent, for 24 hours were run on Native PAGE to compare oligomeric states. Chelation results in the breakdown of the high molecular weight band corresponding to the hexameric state, in favour of a smaller band approximately the size expected for the dimer. Additionally, CyCsoSCA was incubated across a pH gradient and subsequently assessed by Native PAGE to capture pH-dependent perturbations in oligomeric state. As solution pH approaches the pKa of histidine, a corresponding oligomeric shift is observed. At pH 3.5, smearing is indicative of protein denaturation, and at pH 4 a faint band is visible between 146 kDa and 66 kDa, consistent with the dimeric CyCsoSCA (104 kDa). In samples incubated at pH 5, the dimeric band

Figure S14. The opportune zinc site in the NTD of the crystallised HnCsoSCA previously characterised arose from an inadvertent mutant and likely precluded the formation of the true hexameric state 27. The HnCsoSCA dimer (PDB:2FGY) is shown (purple) overlaid with the CyCsoSCA structure solved here (green), with zinc ligands shown as spheres (grey spheres for CyCsoSCA-associated zinc ions and purple spheres for those associated with HnCsoSCA). A box with a zoomed-in view of the structural zinc site is shown, with key zinc binding ligands in CyCsoSCA (His155) and HnCsoSCA (His92, Asp115, His121) shown in stick representation. The purple sphere shown here depicts the non-biologically relevant zinc-binding site in the NTD, while the grey sphere denotes the additional zinc ion demonstrated to mediate additional contacts necessary for hexamer formation.

Figure S15. Examples of operon types observed in the CsoSCA dataset. For each sequence in the 'non-cso' cluster, the genomic context was pulled using the JGI IMG database and manually assessed. Canonical HnCsoSCA and CyCsoSCA demonstrate typical cso operon structure. The colours of each gene are automatically generated on JGI IMG based on different COG annotations.